Clock skew compensation

ABSTRACT

A method and arrangement in a receiving communication device for compensating for the difference between the clock-frequency controlled sample rate of the receiving device and the sample rate of a sending communication device. The sending device transmits packets comprising M audio samples to be stored in a buffer in the receiving device accommodating at least 2·M samples before play-out. An estimation of the clock skew is continuously updated from a calculated accumulated difference between an expected and an actual point of time of reception of the M audio samples. Before play-out, an adjusted number N of audio samples to be read from the buffer is calculated using the estimated clock skew. Thereafter, the N audio samples are resampled by interpolation to M audio samples for play-out.

TECHNICAL FIELD

The present invention relates to a communicating device capable of receiving packets of audio samples, and more specifically to a method and an arrangement in a receiving communicating device for compensating for a difference in the clock frequency between said receiving communicating device and a sending communicating device, the clock frequency controlling the audio sample rate.

BACKGROUND

In a conventional circuit switched telephony network, each telephone exchange receives a synchronization clock signal that is distributed hierarchically to every node in the network, thereby achieving a synchronized communication. However, such a hierarchical synchronization is not always possible in a packet switched network, e.g. when personal computers communicate over the Internet.

In e.g. IP (Internet Protocol)-telephony, voice samples are forwarded from a sending communicating device to a receiving communicating device, and the latency, or delay, of the connection defines the time it takes for a data packet to be transported between the sending communicating device and the receiving communicating device. The packets are stored temporarily in buffers in the nodes of the packet switched network, and the varying storage time in the buffers leads to variations in the delay, which is referred to as delay jitter. While a circuit switched network normally is designed to minimize the jitter, a packet switched network is designed to maximize the link utilization by queuing the packets in the buffers for subsequent transmission, which will add to the delay jitter.

Protocols used to carry voice signals over the IP network are commonly referred to as VoIP (Voice over Internet Protocol), allowing a unified network to be used for multiple services. An incoming IP-phone call may be automatically routed to an IP-phone located anywhere, and thereby a user is allowed to make and receive phone calls using the same phone number while travelling, regardless of location. However, VoIP involves drawbacks, such as delay, packet loss and the above-described delay jitter. The delay jitter may lead to buffer underrun, when a play-out buffer runs out of voice data to play because the next voice packet has not arrived, but the consequences of the jitter are normally reduced by a de-jittering buffer located in the receiving communicating device. The de-jittering buffer adds a variable extra delay before the audio samples of the packet are played out, to keep the overall delay time constant, or slowly varying, in order to minimize the overall delay at some given packet loss rate depending on the current network conditions. Thereby, the occurrence of buffer underrun due to delay jitter may be avoided, but the overall delay is increased.

Additionally, the clock frequency controlling the sample reception in a receiving communicating device is not exactly the same as the clock frequency of the sending communicating device, due to differences in e.g. the quartz crystal oscillators of the clocks. The difference between the transmitting clock frequency, f_(Tx), and the receiving clock frequency, f_(Rx), of the samples is commonly referred to as clock skew. The accuracy A of a clock is often expressed in ppm (parts per million), and in existing IP-telephony connections, the clock skew is normally less than 60 ppm, but may in some cases reach 300 ppm. In a data packet containing M samples, the time period of the packet is M/f, and the actual difference between the packet time period in the transmitter and in the receiver can be expressed as τ=(M/f_(Tx))−(M/f_(Rx)), which is sometimes called the clock skew parameter, but is hereinafter referred to as the clock skew, τ, which may have a positive or a negative value.
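As a worked illustration of the definition above, the following minimal Python sketch evaluates τ=(M/f_(Tx))−(M/f_(Rx)) for a 60 ppm frequency offset. The function names, the nominal 8 kHz sample rate and the packet size M=160 are assumptions chosen only for illustration.

```python
def clock_skew(m, f_tx, f_rx):
    """Clock skew tau = M/f_Tx - M/f_Rx, in seconds per packet of M samples."""
    return m / f_tx - m / f_rx


def offset_ppm(f_tx, f_rx):
    """Relative frequency offset of the transmitter versus the receiver, in ppm."""
    return (f_tx - f_rx) / f_rx * 1e6


# Assumed example values (not taken from the text): 8 kHz nominal rate,
# transmitter running 60 ppm fast, 160 samples per packet.
F_RX = 8000.0
F_TX = F_RX * (1 + 60e-6)
M = 160

tau = clock_skew(M, F_TX, F_RX)
print(f"offset = {offset_ppm(F_TX, F_RX):.1f} ppm, tau = {tau * 1e6:.3f} microseconds")
```

With these assumed values the transmitter clock is the faster one, so τ comes out negative, at roughly −1.2 microseconds per packet.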

The difference between the point of time indicated by the clock in the receiver and the clock in the transmitter will accumulate over time, and cause problems. If the clock frequency of the transmitter is higher than that of the receiver, the clock skew, τ, is negative and the receiver will continuously receive more samples than it is able to play out following its own clock frequency, which will lead to an overrun of the play-out buffer in the receiver. If, however, the clock frequency in the transmitter is lower than in the receiver, the clock skew is positive and the play-out buffer in the receiver will at certain intervals run out of audio samples to play out, i.e. an underrun.

A receiver may have a play-out buffer accommodating only the samples of one packet, and those samples are read from the buffer at play-out. If a new packet arrives before the previous packet has been played out from the buffer, the previous packet will be overwritten before play-out, resulting in a packet slip. Similarly, if the data of the buffer is played out before arrival of a packet, there is no data to read, which also will result in a packet slip.

Thus, both overrun and underrun of the play-out buffer will cause a packet slip to occur at regular time intervals, when the accumulated error in the expected packet arrival time reaches the packet time period M/f_(Rx) of the receiving communicating device, of which M is the number of audio samples in the packet. The time period between the packet slips is inversely dependent on the size of the clock skew, since a large clock skew will lead to more frequent packet slips. Following from the above-described relationships, the mean value T_(PER) of the time period between the packet slips may be calculated as the absolute value of 1/(1−f_(Tx)/f_(Rx)). The influence of the jitter results in an actual time period between the packet slips that varies around this mean value. Thus, the delay jitter and the clock skew will both contribute to a synchronization error. However, the effects of the delay jitter may be avoided by a de-jittering buffer, as described above, but the clock skew will still result in overrun or underrun of the play-out buffer.
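The relationship between the clock skew and the slip interval can be checked numerically. The sketch below counts how many packets it takes for the per-packet error |τ| to add up to one packet period M/f_(Rx), and compares that count with the expression |1/(1−f_(Tx)/f_(Rx))| quoted above. Reading that expression as a count of packet periods is an interpretation made here, and the function names and example values (8 kHz, M=160, 60 ppm) are assumptions.

```python
import math


def packets_between_slips(m, f_tx, f_rx):
    """Packets needed for the per-packet error |tau| to accumulate to M/f_Rx."""
    tau = m / f_tx - m / f_rx
    if tau == 0:
        return math.inf  # perfectly matched clocks never slip
    return (m / f_rx) / abs(tau)


def t_per(f_tx, f_rx):
    """The expression |1/(1 - f_Tx/f_Rx)| quoted in the text."""
    if f_tx == f_rx:
        return math.inf
    return abs(1.0 / (1.0 - f_tx / f_rx))


# Assumed example values: 8 kHz receiver clock, transmitter 60 ppm fast.
f_rx = 8000.0
f_tx = f_rx * (1 + 60e-6)
m = 160

count = packets_between_slips(m, f_tx, f_rx)
seconds = count * m / f_rx  # slip interval measured on the receiver clock
print(f"{count:.0f} packets between slips, about {seconds:.0f} s")
print(f"quoted expression: {t_per(f_tx, f_rx):.0f}")
```

The two figures differ only by the factor f_(Tx)/f_(Rx), which is very close to 1 for realistic clock skews.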

The effects of the clock skew can be reduced by a continuous adjustment of the clock frequencies, e.g. by the use of GPS (Global Positioning System). However, this is not always possible, e.g. when the audio sample rate is controlled by an independent hardware clock in the audio card of a standard personal computer, or when an IP network and a PSTN (Public Switched Telephone Network) are interconnected by a Media Gateway, in which case the play-out rate of the audio samples is always synchronized with the PSTN clock. A so-called Media Gateway is commonly used to connect different types of communication networks, and is able to convert data from the format required for one type of network to the format required for another.

Another method of compensating for the clock skew is by signal processing, e.g. by duplicating a sample value in the play-out buffer each time the receiver clock has gained one sample time relative to the transmitter clock, and correspondingly deleting one sample each time the receiver clock has lost one sample time. However, this leads to a degradation in the quality of the play-out. A higher quality is achieved if the addition/deletion of a sample is performed during silence periods, but this is only satisfactory when the background is relatively silent.

Tõnu Trump: “Maximum Likelihood Trend Estimation in Exponential Noise”, IEEE Transactions on Signal Processing, Vol. 49, No. 9, September 2001, pages 2087-2095, addresses how to estimate a linear trend in noise, and in particular how to derive a recursive algorithm for estimating said clock skew, which may be used in real-time applications. Further, Tõnu Trump describes in “Compensation for clock skew in voice over packet networks by speech interpolation”, Proc. IEEE International Symposium on Circuits and Systems, Vol. 5, pp. V-608-V-611, May 2004, an algorithm for compensating for the clock skew by performing a more complex signal processing of the received audio samples in a receiving communicating device. The algorithm performs resampling of the number of samples in the play-out buffer in the receiver depending on an estimation of the clock skew, and the resampling involves interpolation of samples, preferably using spline interpolation. Resampling is a process of changing the sampling rate of a signal, either downsampling or upsampling, by dividing/multiplying the sampling rate with an appropriate resampling factor, and interpolation involves construction of additional samples from known samples. While linear interpolation performed on known samples interpolates a linear function between the samples, spline interpolation uses low degree polynomials in each of the intervals between the known samples. However, the above-described theories are difficult to implement in practical communication systems, since they are adapted for complete test vectors, and cannot be applied continuously on every received packet.

Thus, the clock skew still presents a problem in applications when the clocks cannot be synchronized, leading to packet losses and disturbances in the audio content.

SUMMARY

The object of the present invention is to address the problems outlined above, and to provide efficient compensation for the clock skew. This object and others are achieved by a method and an arrangement, according to the appended independent claims.

According to one aspect, a method is provided in a receiving communicating device of compensating for the difference between the clock-frequency controlled sample rate of the receiving communicating device and of a sending communicating device. The sending communicating device transmits packets comprising M audio samples to the receiving communicating device, and 0, 1 or 2 packets are received and stored in a buffer in the receiving communicating device, the buffer accommodating at least 2·M audio samples. Further, an estimation of the clock skew, τ, is continuously updated from a calculated accumulated difference between expected and actual point of time of reception of said audio samples. Before play-out, an adjusted number N of audio samples to be read from said buffer is calculated using said estimation of the clock skew, and the adjusted number N of audio samples are read from the buffer. Thereafter, the N audio samples are resampled by interpolation to M samples for play-out.

Said receiving communicating device may be an end-user terminal, a Media Gateway, or any other communicating device acting as a network node.

The method may comprise an additional step of controlling the number b of samples stored in the buffer, comprising a calculation of an adjusted value N_(lim) of samples to be read out from said buffer. Thereby, packet slips caused by e.g. estimation errors are further reduced.

N_(lim) may correspond to the number b of samples stored in the buffer when N is larger than b, to prevent underflow of the buffer, and to b−M when b is larger than 2·M, to prevent overflow.

Alternatively, in order to avoid the need of a separate de-jittering buffer, N_(lim) may correspond to the number b of samples stored in the buffer minus B₀ when N is larger than b−B₀, and to b−(B₀+M) when b is larger than B₀+2·M, of which the value of B₀ depends on T_(jitter) and on the clock frequency of the receiving communicating device. The value of T_(jitter) is the selected time interval by which a packet is delayed before play-out to compensate for delay jitter, and it may be fixed or adaptive.

The interpolation method may comprise spline interpolation, such as cubic spline interpolation, and said packets may be IP-packets, i.e. packets encapsulated and transmitted according to the Internet Protocol.

According to another aspect, an arrangement is provided in a receiving communicating device for compensating for the difference between the clock-frequency controlled sample rate of the receiving communicating device and of a sending communicating device. The sending communicating device transmits packets comprising M audio samples to the receiving communicating device, and the compensation is performed before play-out of each packet. The arrangement comprises a buffer accommodating at least 2·M samples for storing 0, 1 or 2 received packets, a clock skew estimating unit for continuously updating an estimation of the clock skew from a calculated accumulated time difference between expected and actual point of time of reception of the M audio samples, a converting unit for calculating, from said estimation of the clock skew, an adjusted number N of audio samples to be read from said buffer before play-out, and an interpolating unit for resampling by interpolation the N audio samples read from the buffer to M audio samples to be played out.

Said receiving communicating device may be an end-user terminal, a Media Gateway, or any other communicating device acting as a network node.

The arrangement may further comprise a limiting unit for controlling the number b of samples stored in the buffer by calculating an adjusted value N_(lim) of samples to be read out from said buffer. N_(lim) may correspond to the number b of samples stored in the buffer when N is larger than b, to prevent underflow of the buffer, and to b−M when b is larger than 2·M. Thereby, packet slips caused by e.g. estimation errors are further reduced.

Alternatively, in order to avoid the need of a separate de-jittering buffer, N_(lim) may correspond to the number b of samples stored in the buffer minus B₀ when N is larger than b−B₀, and to b−(B₀+M) when b is larger than B₀+2·M, of which the value of B₀ depends on T_(jitter) and on the clock frequency of the receiving communicating device.

The interpolating unit may be a spline interpolation unit, and more specifically a cubic spline interpolation unit, and said packets may be IP-packets, i.e. packets encapsulated and transmitted according to the Internet Protocol.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described in more detail and with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates two communicating devices communicating over a packet switched network,

FIG. 2 is a graph illustrating the accumulation of the error in the expected arrival time of a packet,

FIG. 3 schematically illustrates a receiver comprising a de-jittering buffer,

FIG. 4 is a block diagram illustrating the logical function of a clock skew compensating arrangement,

FIG. 5 is a flow chart illustrating the clock skew compensation method according to a first embodiment of the invention, and

FIG. 6 is a block diagram illustrating the second and third embodiments of the clock skew compensating arrangement, comprising a limiting unit.

DETAILED DESCRIPTION

Thus, this invention compensates for the clock skew caused by different, non-synchronized clock frequencies in a sending communicating device and a receiving communicating device, which may be an end user communicating device or a so-called telecommunication network node, such as e.g. a Media Gateway. This is accomplished by compensating for the overrun and underrun occurring in the play-out buffer in a receiving communicating device by a compensation procedure, which is performed on each received packet containing M audio samples, on a packet-by-packet basis.

Initially, the compensation procedure performs a calculation of the accumulated error in the expected arrival time of the M samples of 0, 1 or 2 packets, and a continuous updating of an estimation of the clock skew. Thereafter, at the delivery of a packet, it uses the estimated clock skew to perform a clock skew compensating calculation of a different number N of samples to be read from the play-out buffer, and a resampling of the N read samples by the factor M/N by interpolation, thereby creating M audio samples to be played out, in which the clock skew is compensated for.

The resampling is a process where the incoming samples are used as a basis for creating a second set of samples played out at a different rate, i.e. number of samples per time frame. This second set of samples is synchronous with the receiving communicating device, meaning that the rate at which the samples are consumed from the sending communicating device is controllable by the resampling process. Thereby, a compensation for the difference in the clock frequency between the sending communicating device and the receiving communicating device is achieved, and most of the packet slips caused by the different clock frequencies can be eliminated.

The time delay introduced by the clock skew compensation according to this invention is at least one packet period M/f_(Rx), since otherwise packet slips would occur, e.g. in the case when f_(Tx) is smaller than f_(Rx). Packet slips may still occur due to estimation errors, but the time periods between the packet slips are increased considerably.

However, in order to reduce the packet slips caused by said estimation errors, the actual number of samples, b, in the buffer may be monitored and limited to be between 0 and 2·M. If more samples, N, are requested to be read out from the buffer than the actual number of samples, b, in the buffer, then only the b remaining samples will be read to avoid underflow, and if the actual number of samples, b, in the buffer is larger than 2·M, then all remaining samples above M will be read to avoid overflow. This is achieved according to a second embodiment of the invention by a limiting procedure involving a comparison of the actual number of samples b in the buffer with the calculated value of N samples to read from the buffer and with the number 2·M, and by an adjustment of the number N of samples to be read out from the buffer accordingly. If the value N is larger than the number b, then N is adjusted to correspond to b, and if b is larger than 2·M, then N is adjusted to correspond to b−M. Thereby, most of the remaining packet slips are eliminated, and the mean time delay is limited to M/f_(Rx).

According to a third embodiment of this invention, the clock skew compensating procedure compensates for the delay jitter, as well, and thereby no additional conventional de-jittering buffer will be needed. This is accomplished by introducing an additional number of B₀ samples in the buffer to compensate for the delay jitter, B₀≈f_(Rx)·T_(jitter), and limiting the number, b, of samples in the play-out buffer to be between B₀ and B₀+2·M, by an appropriate adjustment of the number of samples N to be read out. If N is larger than the actual number b minus B₀ samples in the buffer, then N is adjusted to b−B₀ to avoid underflow and to always keep B₀ samples in the buffer. Correspondingly, if b is larger than B₀+2·M, then N is adjusted to correspond to b−(B₀+M). The compensation procedure may be configured for any value of T_(jitter), causing no distortions as long as the packets arrive in order. Thereby, the configuration of the receiving communicating device is simplified, since no separate de-jittering buffer is required.

FIG. 1 shows two communicating devices communicating over a packet switched network 3, both communicating devices normally capable of functioning both as a transmitting communicating device and a receiving communicating device, by means of suitable hardware and software, according to the common general knowledge of the skilled person within the technical field. However, in this figure, one of the communicating devices, 1, is denoted the transmitting communicating device, Tx, and the other communicating device, 2, is denoted the receiving communicating device, Rx. The clocks of the transmitting communicating device and the receiving communicating device are not synchronized, and an audio signal that is sampled with the sampling frequency f_(Tx) in the transmitting communicating device 1 will be played out with a slightly different sampling frequency f_(Rx) in the receiving communicating device 2. The transmitting communicating device 1 sends packets to the receiving communicating device 2, each packet containing M audio samples, M typically being a multiple of 40, such as e.g. 40, 80 and 160, and each of the audio samples of a packet is transmitted at the time instances t_(Tx) controlled by the clock frequency f_(Tx) of the transmitting communicating device 1. The packets are expected by the receiving communicating device at the expected time instances t_(Rx) controlled by the clock frequency f_(Rx) of the receiving communicating device, to which an expected transmission delay time, d_(Rx), has been added. However, the actual arrival time t_(R) of each packet will normally not correspond exactly to this expected arrival time, due to the delay jitter and the clock skew.

The difference between the actual and the expected arrival time of the packets will also accumulate over time and can be defined as y(t)=t_(R)−t_(Rx). The contribution of the clock skew to y(t) is a linear function with the slope τ, and the jitter v(t) adds a random contribution to y(t), as illustrated in FIG. 2. This graph shows y(t)=a+τt+v(t), in which the parameter a can be interpreted as a correction of the expected transmission delay, and the clock skew, τ, controls the slope.
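The model y(t)=a+τt+v(t) can be illustrated with a small simulation. The sketch below generates synthetic arrival-time errors and recovers the slope with an ordinary least-squares line fit. This batch fit is a stand-in chosen for brevity and is not the recursive maximum likelihood estimator referred to in this document. Measuring t as a packet index k, drawing the jitter from an exponential distribution, and all names and numeric values are assumptions.

```python
import random


def simulate_errors(a, tau, n, jitter_mean, seed=0):
    """Synthetic y[k] = a + tau*k + v[k] for packet index k = 0..n-1."""
    rng = random.Random(seed)
    errors = []
    for k in range(n):
        # Assumed jitter model: exponentially distributed extra delay.
        v = rng.expovariate(1.0 / jitter_mean) if jitter_mean > 0 else 0.0
        errors.append(a + tau * k + v)
    return errors


def fit_trend(y):
    """Ordinary least-squares fit of y[k] = a + tau*k; returns (a, tau)."""
    n = len(y)
    k_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((k - k_mean) * (yk - y_mean) for k, yk in enumerate(y))
    sxx = sum((k - k_mean) ** 2 for k in range(n))
    tau = sxy / sxx
    return y_mean - tau * k_mean, tau


# Assumed values: 10 ms delay correction, -1.2 microseconds of skew per
# packet, 5000 packets, 2 ms mean jitter.
y = simulate_errors(a=0.010, tau=-1.2e-6, n=5000, jitter_mean=0.002)
a_hat, tau_hat = fit_trend(y)
print(f"estimated slope: {tau_hat:.3e} s per packet")
```

Because the assumed jitter has a non-zero mean, it shifts the fitted offset a but leaves the slope, and hence the skew estimate, essentially unaffected.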

FIG. 3 schematically illustrates a receiving communicating device 2 comprising a de-jittering buffer 4 according to prior art, which is commonly used to solve the problem caused by the delay jitter. Packets containing M audio samples are input in the receiving communicating device at an actual arrival time t_(R), and stored in the de-jittering buffer 4, which suitably comprises a FIFO (First-In First-Out)-buffer having the size M. The samples are read from the de-jittering buffer at the scheduled play-out time, which is calculated by an addition of a time interval T_(jitter) to compensate for the jitter v(t). A larger added T_(jitter) reduces the risk that a delayed packet will not be played out, i.e. it reduces the risk for a packet slip, but it will also increase the overall delay. Therefore, T_(jitter) is preferably estimated such that ideally no sample will be read from the de-jittering buffer until after the actual arrival time, thereby compensating for the jitter, while at the same time introducing the smallest possible delay. However, since the difference between the expected and the actual arrival time y(t) caused by the contribution from the clock skew will accumulate linearly in time when τ≠0, this cannot be compensated for by the de-jittering buffer, regardless of the size of T_(jitter), thereby resulting in packet slips due to overrun/underrun.

In order to solve the problem with the clock skew, this invention involves a compensation for the overruns and underruns in the play-out buffer caused by the clock skew.

FIG. 4 is a block diagram illustrating an arrangement in a receiving communicating device for performing clock skew compensation, according to a first embodiment of the invention. The packets containing M audio samples are input in the receiver at the actual time instances t_(R) and stored in a play-out buffer 5, e.g. a FIFO-buffer, having a buffer size of at least 2·M, since 0, 1 or 2 packets may arrive in one packet period. Those time instances t_(R) are stored in a memory (not shown) until the packets are played out from the buffer. A clock skew estimation unit 6 calculates the accumulated difference y(t) between the actual arrival time t_(R) and the expected arrival time t_(Rx) of the M samples using the stored time instances t_(R). Thereafter, the clock skew estimation unit updates the estimated clock skew, τ, accordingly. The value of the estimated clock skew can be calculated mathematically from t_(R)−t_(Rx) by any suitable clock skew estimating algorithm, preferably a recursive algorithm for implementation in real time systems, e.g. a recursive maximum likelihood algorithm. The purpose of the clock skew estimating algorithm in this invention is to update the value of the estimated clock skew τ after each received packet of M audio samples, using as input variable the cumulative difference between the actual and the expected arrival time, y=t_(R)−t_(Rx).

Before play-out of the packets, a converting unit 7 receives the estimated clock skew τ and uses it to calculate an adjusted number N of samples to be read from the buffer 5 at the time instant t_(Rx), to compensate for the clock skew in order to avoid underrun or overrun of the buffer. This is achieved by scaling the number M according to the frequency ratio f_(Tx)/f_(Rx), by selecting a value N that is the result of rounding n=M·(f_(Tx)/f_(Rx)) to the closest integer. However, the input to the converting unit will be an estimation of the clock skew, τ=(M/f_(Tx))−(M/f_(Rx)), and f_(Tx) is not known by the receiving communicating device. Since n must be obtainable by the receiving communicating device from τ as an input value, and f_(Tx) is unknown to the receiver, n may e.g. be derived via the accuracy A, which may be expressed as A_(Tx)=−(f_(Rx)·τ)/(M+f_(Rx)·τ), from which n may be calculated as n=M²/(M+f_(Rx)·τ).
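The conversion from the estimated clock skew to a sample count can be sketched as follows. The code evaluates n=M²/(M+f_(Rx)·τ), which needs only quantities known to the receiver, and rounds it to the closest integer N. The function name, the error handling and the example values are assumptions. How a fractional remainder of n might be carried from one packet to the next is not described in the text and is not modelled here.

```python
def samples_to_read(m, f_rx, tau):
    """Return (n, N): n = M^2 / (M + f_Rx * tau) and N = n rounded to an integer."""
    denominator = m + f_rx * tau
    if denominator <= 0:
        # Assumed guard: such a skew estimate would not be physically meaningful.
        raise ValueError("clock skew estimate out of range")
    n = m * m / denominator
    return n, round(n)


# Assumed example: 8 kHz receiver clock, M = 160 samples per packet.
f_rx, m = 8000.0, 160

# Consistency check against n = M * f_Tx / f_Rx for a transmitter 60 ppm fast.
f_tx = f_rx * (1 + 60e-6)
tau = m / f_tx - m / f_rx
n, n_int = samples_to_read(m, f_rx, tau)
print(n, m * f_tx / f_rx, n_int)

# An exaggerated positive skew (slow transmitter) gives N below M.
print(samples_to_read(m, f_rx, 1e-4))
```

The first print shows that the receiver-side formula reproduces M·(f_(Tx)/f_(Rx)) without knowledge of f_(Tx). For skews in the ppm range n differs from M by only a small fraction of a sample.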

An interpolating unit 8 receives the value N from the converting unit 7, reads N samples from the buffer 5, and resamples the N samples by interpolation to provide a packet containing M samples to be delivered at the output. This resampling is equivalent to a change of the sampling rate by the factor M/N, and can be achieved with any suitable digital signal processing technique. One exemplary method of resampling is performed by first interpolating the signal with M−1 zeros after each sample, thereafter applying a low pass filter to avoid aliasing, i.e. that two different continuous signals become indistinguishable when sampled, and finally performing a decimation with the factor N. However, this method is complicated when the ratio M/N is close to 1. A preferred method of resampling by interpolation is to apply spline interpolation, which has a lower complexity when M/N is close to 1. Splines are piecewise polynomials with pieces that connect smoothly together. A zero order B-spline, β⁰(t), is a rectangular pulse, and an n-th order B-spline is obtained by convolving n+1 copies of β⁰(t) with each other.
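The following sketch illustrates the resampling of N samples to M samples. For brevity it uses linear interpolation rather than the preferred spline interpolation, so it shows the data flow of the interpolating unit and not the audio quality of a spline-based implementation. Aligning the first and last input samples with the first and last output samples is a simplifying assumption, as are the function and variable names.

```python
def resample_linear(samples, m):
    """Resample a list of N samples to m samples by linear interpolation."""
    n = len(samples)
    if n == 0 or m <= 0:
        raise ValueError("need at least one input sample and one output sample")
    if n == 1 or m == 1:
        return [float(samples[0])] * m
    # Assumed mapping: output index j corresponds to input position
    # j * (N - 1) / (M - 1), so both end points coincide.
    step = (n - 1) / (m - 1)
    out = []
    for j in range(m):
        pos = j * step
        i = min(int(pos), n - 2)  # left neighbour, kept inside the block
        frac = pos - i            # fractional distance to the right neighbour
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


# Example: N = 161 samples read from the buffer, M = 160 samples played out.
block = [float(k) for k in range(161)]
played = resample_linear(block, 160)
print(len(played), played[0], played[-1])
```

When N equals M the function returns the input values unchanged, which corresponds to the case of no clock skew.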

FIG. 5 is a flow chart illustrating the clock skew compensating method according to a first embodiment of the invention, the first part comprising the steps 500-520 of receiving a packet, which may be performed 0, 1 or 2 times, and the second part comprising the steps 525-535, which are performed when the receiving communicating device has to deliver a packet.

In a first step 500, the receiving communicating device 2 receives a packet containing M audio samples at the actual time instances t_(R). In a next step 505, the receiving communicating device stores the M audio samples in a buffer 5 having a size of at least 2·M in order to accommodate 0, 1 or 2 packets, the actual number of samples stored in the buffer being denoted by b, and the actual time instances are stored in a memory in step 510. In the next step 515, the accumulated time difference between the actual time instances t_(R) and the expected time instances t_(Rx) is calculated, and in step 520 the value of the estimated clock skew τ is updated from this calculated difference. The estimated clock skew is calculated by a suitable algorithm, e.g. a recursive maximum likelihood algorithm.

When a packet has to be delivered, i.e. played out, the estimated clock skew is converted, in step 525, to a number N of samples to be read from the buffer 5 at the time instances t_(Rx) in order to compensate for the clock skew. The number N is obtained by a scaling of the number M with the ratio f_(Tx)/f_(Rx), using the relationship N=M²/(M+f_(Rx)·τ), which follows from the definition of the clock skew τ=(M/f_(Tx))−(M/f_(Rx)). In step 530, N samples are read from the buffer 5, and in step 535, the N samples are resampled to M samples by interpolation, e.g. by spline interpolation, or by any other suitable digital resampling technique.

FIG. 6 shows an arrangement according to a second embodiment of the invention, in which the clock skew compensating arrangement is provided with a limiting unit 9 for controlling the actual number b of samples stored in the buffer 5, keeping the number b of samples between 0 and 2·M. This is implemented by a comparison between the calculated number N to be read from the buffer and the stored number b of samples in the buffer. If the number N is larger than the stored number, b, of samples in the buffer, the number N is adjusted to N_(lim)=b, whereby only the b samples in the buffer will be read and underflow of the buffer is prevented. If the number b of samples in the buffer is larger than 2·M, all the samples above the number M will be read, i.e. N_(lim)=b−M, thereby limiting the stored number of samples in the buffer and preventing overflow.
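The limiting rule of the second embodiment can be written as a small function. This is a minimal sketch in which the function name is hypothetical. The order in which the two conditions are tested is an assumption, since the text does not state which takes precedence if both were to hold.

```python
def limit_second_embodiment(n, b, m):
    """Return N_lim given the requested count N, the buffer fill b and the packet size M."""
    if b > 2 * m:
        return b - m  # overflow guard: read everything above M samples
    if n > b:
        return b      # underflow guard: read only what is stored
    return n          # no limiting needed


# Example with M = 160 samples per packet.
print(limit_second_embodiment(n=160, b=200, m=160))  # within limits
print(limit_second_embodiment(n=160, b=100, m=160))  # buffer nearly empty
print(limit_second_embodiment(n=160, b=400, m=160))  # buffer too full
```

In the overflow case exactly M samples remain in the buffer after the read, which keeps the fill level b inside the range 0 to 2·M described above.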

In prior art, a de-jittering buffer 4 may be used to compensate for delay jitter by delaying each packet by a time period T_(jitter) before play-out. The value of T_(jitter) may be fixed or adaptive, and is selected as a compromise between the probability of packet slips and the overall delay introduced by the de-jittering buffer.

However, according to a third embodiment of the invention, the buffer 5 will be used as a de-jittering buffer, as well, to compensate for the delay jitter. This is accomplished by arranging the limiting unit 9 to control the buffer 5 to always keep at least B₀ samples in the buffer, and B₀ may be approximately f_(Rx)·T_(jitter), rounded up to an integer. Thus, the limiting unit 9 will compare the calculated number N of samples to be read from the buffer with the number b of stored samples in the buffer, and monitor the stored number b of samples in order to keep the number of samples b not less than B₀ and not more than B₀+2·M by adjusting the number N of samples to be read accordingly. If the calculated number N of samples to be read is larger than the actual number b of samples in the buffer minus B₀, the number of samples N to be read out is adjusted to N_(lim)=b−B₀, in order to prevent underflow and to always keep B₀ samples in the buffer. If b is larger than B₀+2·M, then all the samples above the number B₀+M will be read, i.e. N_(lim)=b−(B₀+M), in order to limit the stored number of samples b in the buffer to prevent overflow.
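The third embodiment extends the same limiting rule with a reserve of B₀ samples. The sketch below computes B₀ by rounding f_(Rx)·T_(jitter) up to an integer and then applies the two limits. The function names, the order of the tests and the clamping of the result at zero while the buffer still holds fewer than B₀ samples are assumptions made for illustration.

```python
import math


def jitter_reserve(f_rx, t_jitter):
    """B0: f_Rx * T_jitter rounded up to an integer number of samples."""
    return math.ceil(f_rx * t_jitter)


def limit_third_embodiment(n, b, m, b0):
    """Return N_lim so that the buffer fill stays between B0 and B0 + 2*M."""
    if b > b0 + 2 * m:
        return b - (b0 + m)    # overflow guard: keep B0 + M samples
    if n > b - b0:
        # Underflow guard: keep B0 samples. Clamping at zero while the
        # buffer is still filling up is an assumption, not from the text.
        return max(b - b0, 0)
    return n


# Assumed example: 8 kHz receiver clock, M = 160, T_jitter = 31.25 ms.
b0 = jitter_reserve(8000.0, 0.03125)
print(b0)
print(limit_third_embodiment(n=160, b=300, m=160, b0=b0))
print(limit_third_embodiment(n=160, b=700, m=160, b0=b0))
```

Setting B₀ to zero reduces this function to the limiting rule of the second embodiment.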

Thereby, the introduced delay according to the third embodiment of the invention will be (B₀+M)/f_(Rx), instead of M/f_(Rx), as in the first and second embodiments. The sample positions 0 to B₀ in the buffer are used to compensate for the delay jitter, and the sample positions B₀ to B₀+2·M are used to compensate for the clock skew, and no separate de-jittering buffer is needed.

At the beginning of an operation, the error in the estimated clock skew τ may be significant due to too little available data. Therefore, according to a further embodiment of the invention, the output from the clock skew estimating unit is not used in the beginning of a transmission, e.g. during the reception of the first 50 frames.
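A start-up gate of this kind might look as follows. The text states only that the estimator output is not used at the beginning of a transmission. Substituting a skew of zero during that period, so that N equals M, is an assumption of this sketch, as are the names used.

```python
WARMUP_PACKETS = 50  # the "first 50 frames" given as an example in the text


def skew_for_conversion(estimated_tau, packets_received, warmup=WARMUP_PACKETS):
    """Return the skew value passed on to the converting unit."""
    if packets_received < warmup:
        # Assumption: apply no compensation until the estimate has settled.
        return 0.0
    return estimated_tau


print(skew_for_conversion(-1.2e-6, packets_received=10))
print(skew_for_conversion(-1.2e-6, packets_received=50))
```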

While the invention has been described with reference to specific exemplary embodiments, the description is in general only intended to illustrate the inventive concept and should not be taken as limiting the scope of the invention.

1-24. (canceled)
 25. A method in a receiving communicating device ofcompensating for a difference between a clock-frequency controlledsample rate of the receiving communicating device and of a sendingcommunicating device, wherein the sending communicating device transmitspackets comprising M audio samples to the receiving communicatingdevice, said method comprising the steps of: receiving 0, 1, or 2packets comprising audio samples; storing the audio samples in a bufferaccommodating at least 2·M audio samples in the receiving communicatingdevice before play-out; storing the actual time of reception of each ofthe received audio samples; continuously updating an estimation of aclock skew (X) from a calculated accumulated difference between anexpected time of reception and the stored actual time of reception ofthe audio samples; before play-out, calculating an adjusted number N ofaudio samples to be read from the buffer using the estimation of theclock skew; reading N audio samples from the buffer; and resampling byinterpolation, the N audio samples read from the buffer to M samples toplay-out.
26. The method in a receiving communicating device according to claim 25, wherein the method is performed in an end-user terminal.
27. The method in a receiving communicating device according to claim 25, wherein the method is performed in a Media Gateway.
28. The method in a receiving communicating device according to claim 25, further comprising controlling the number b of samples stored in the buffer.
29. The method in a receiving communicating device according to claim 28, wherein said step of controlling the number b of samples stored in the buffer includes calculating an adjusted value N_(lim) of samples to be read out from the buffer.
30. The method in a receiving communicating device according to claim 29, wherein N_(lim) corresponds to the number b of samples stored in the buffer when N is larger than b to prevent underflow of the buffer.
31. The method in a receiving communicating device according to claim 29, wherein N_(lim) corresponds to b minus M when b is larger than 2·M.
32. The method in a receiving communicating device according to claim 29, wherein N_(lim) corresponds to the number b of samples stored in the buffer minus the number B₀, when N is larger than b−B₀, in order to compensate for delay jitter, the value of B₀ depending on T_(jitter) and on the clock frequency of the receiving communicating device.
33. The method in a receiving communicating device according to claim 32, wherein N_(lim) corresponds to b−(B₀+M) when b is larger than B₀+2·M.
34. The method in a receiving communicating device according to claim 1, wherein the step of resampling by interpolation includes spline interpolating the N audio samples.
35. The method in a receiving communicating device according to claim 34, wherein the step of spline interpolating comprises cubic spline interpolation.
36. The method in a receiving communicating device according to claim 25, wherein the transmitted packets are IP-packets.
37. An arrangement in a receiving communicating device for compensating for the difference between the clock-frequency controlled sample rate of the receiving communicating device and of a sending communicating device, wherein the sending communicating device transmits packets comprising M audio samples to the receiving communicating device, said arrangement comprising: a buffer accommodating at least 2·M samples for storing 0, 1, or 2 received packets comprising audio samples; a memory for storing an actual time of the reception of each of the M audio samples; a clock skew estimating unit for continuously updating an estimation of the clock skew from a calculated accumulated time difference between an expected time of reception and the stored actual time of reception of the M audio samples; a converting unit for using the estimation of the clock skew to calculate before play-out, an adjusted number of N audio samples to be read from the buffer; and an interpolating unit for resampling by interpolation, the N audio samples read from the buffer to M audio samples to be played-out; wherein the arrangement performs the compensation before play-out of each packet.
38. The arrangement according to claim 37, wherein the receiving communicating device is an end-user terminal.
39. The arrangement according to claim 37, wherein the receiving communicating device is a Media Gateway.
40. The arrangement according to claim 37, further comprising a limiting unit for controlling the number b of samples stored in the buffer.
41. The arrangement according to claim 30, wherein the limiting unit includes means for controlling the number b of samples stored in the buffer by calculating an adjusted value N_(lim) of samples to be read out from the buffer.
42. The arrangement according to claim 41, wherein N_(lim) corresponds to the number b of samples stored in the buffer when N is larger than b to prevent underflow of the buffer.
43. The arrangement according to claim 41, wherein N_(lim) corresponds to b−M when b is larger than 2·M.
44. The arrangement according to claim 41, wherein N_(lim) corresponds to the number b of samples stored in the buffer minus the number of samples B₀, when N is larger than b−B₀, in order to compensate for delay jitter, the value of B₀ depending on T_(jitter) and on the clock frequency of the receiving communicating device.
45. The arrangement according to claim 44, wherein N_(lim) corresponds to b−(B₀+M) when b is larger than B₀+2·M.
46. The arrangement according to claim 37, wherein the interpolating unit includes means for performing spline interpolation of the N audio samples.
47. The arrangement according to claim 46, wherein the means for performing spline interpolation performs cubic spline interpolation.
48. The arrangement according to claim 37, wherein the transmitted packets are IP-packets.